(1901.05350) TensorFlow.js: Machine Learning for the Web and Beyond

benefits

data privacy

accessibility

low-latency in interactive applications

Challenge

Performance

Cross browser compatibility

WebGL 2.0 > 1.0

Chrome and Firefox support WebGL 2.0

Safari supports only WebGL 1.0 and seems to be focusing on WebGPU instead (?)

Single-threaded execution

Opportunities in a browser-based environment

Shareability

without any additional installations

Well suited to educational use cases

"Education" here may mean teaching machine learning itself by making it interactive

Interactivity

user-centric ML application

On-device computation

Standardized access to the camera, microphone, accelerometer, etc. makes it easy to integrate ML with sensors

user data can stay on-device, preserve user-privacy

medical, accessibility, personalized

Users with speech impairments can build a personalized model in the browser from their own voice samples

Web x ML

With federated learning, a model can be trained while sensitive data stays on the device

Related work

ConvNetJS (Karpathy, 2014), etc.

Apparently written in plain JS

Also, it is no longer maintained

WebDNN

WebGPU

WebGPU itself was proposed by Apple

Design and API

bring ML to the JS ecosystem

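Usable even by people with no ML experience

Conversely, lets people with ML experience move to JS

Ease of use is prioritized over performance, hence eager automatic differentiation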

Asynchronous execution

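JS is single-threaded and shares that thread with layout and event handling

Long-running tasks degrade the UI

Hence event callbacks and promises

To keep the API intuitive, TF.js balances synchronous and asynchronous calls

tensor.dataSync() blocks

tensor.data() is non-blocking

Returns a promise

WebGL has no garbage collection, so tf.tidy is provided

Just wrap a function in it and the intermediate tensors are disposed as well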
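
A toy sketch of the scope-based cleanup idea behind tf.tidy. This is not TF.js's implementation: ToyTensor and ToyEngine are hypothetical names made up for these notes, and "disposing" only flips a flag and decrements a counter.

```typescript
// Toy model of scope-based cleanup in the spirit of tf.tidy.
// ToyTensor and ToyEngine are hypothetical, not TensorFlow.js classes.
class ToyTensor {
  disposed = false;

  constructor(readonly values: number[], private readonly engine: ToyEngine) {
    engine.track(this);
  }

  // Element-wise add; the result is a new tensor tracked by the same engine.
  add(other: ToyTensor): ToyTensor {
    return new ToyTensor(this.values.map((v, i) => v + other.values[i]), this.engine);
  }

  dispose(): void {
    if (!this.disposed) {
      this.disposed = true;
      this.engine.liveTensors--;
    }
  }
}

class ToyEngine {
  liveTensors = 0;
  private scopes: ToyTensor[][] = [];

  // Every new tensor is counted and recorded in the innermost open scope.
  track(t: ToyTensor): void {
    this.liveTensors++;
    const scope = this.scopes[this.scopes.length - 1];
    if (scope !== undefined) scope.push(t);
  }

  tensor(values: number[]): ToyTensor {
    return new ToyTensor(values, this);
  }

  // Runs fn, then disposes every tensor created inside it except the result.
  tidy(fn: () => ToyTensor): ToyTensor {
    const scope: ToyTensor[] = [];
    this.scopes.push(scope);
    let result: ToyTensor | undefined;
    try {
      result = fn();
      return result;
    } finally {
      this.scopes.pop();
      for (const t of scope) {
        if (t !== result) t.dispose();
      }
      // A result created in this scope becomes the enclosing scope's responsibility.
      const parent = this.scopes[this.scopes.length - 1];
      if (result !== undefined && parent !== undefined && scope.indexOf(result) >= 0) {
        parent.push(result);
      }
    }
  }
}

const engine = new ToyEngine();
const base = engine.tensor([1, 2]);
const sum = engine.tidy(() => {
  const b = engine.tensor([10, 20]); // intermediate, disposed on exit
  const c = base.add(b);             // intermediate, disposed on exit
  return c.add(b);                   // result, survives the scope
});
console.log(sum.values, engine.liveTensors); // [21, 42] and 2 live tensors (base, sum)
```

Only the returned tensor and the tensor created outside the scope remain alive, which is the behavior the note describes.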

tf.time and tf.profile are handy

Peak memory usage can be obtained

Performance

On the server side, it binds to the TensorFlow C API and runs natively instead of in JS. CUDA is also available

WebGL utilization

packing

Storing floats in all four RGBA channels (rather than one) gives a 1.3-1.4x speedup (???)
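
A rough sketch of what "using all four channels" could mean for storage. The layout here (four consecutive values per texel) is my own simplification for illustration, not necessarily the layout TF.js uses; all function names are hypothetical.

```typescript
// One texel holds four float channels: R, G, B, A.
const CHANNELS = 4;

// Unpacked layout: each value sits in the R channel of its own texel;
// the G, B, A slots are allocated but unused.
function toUnpackedTexels(values: number[]): Float32Array {
  const texels = new Float32Array(values.length * CHANNELS);
  values.forEach((v, i) => {
    texels[i * CHANNELS] = v;
  });
  return texels;
}

// Packed layout (assumed): four consecutive values share one texel.
function toPackedTexels(values: number[]): Float32Array {
  const texelCount = Math.ceil(values.length / CHANNELS);
  const texels = new Float32Array(texelCount * CHANNELS);
  texels.set(values);
  return texels;
}

// Looks up logical element i in the packed layout via (texel, channel).
function readPacked(texels: Float32Array, i: number): number {
  const texel = Math.floor(i / CHANNELS);
  const channel = i % CHANNELS;
  return texels[texel * CHANNELS + channel];
}

const sampleValues = [1, 2, 3, 4, 5, 6];
const unpacked = toUnpackedTexels(sampleValues);
const packed = toPackedTexels(sampleValues);
console.log(unpacked.length / CHANNELS, packed.length / CHANNELS); // 6 texels vs 2 texels
```

Fewer texels means fewer texture fetches per value, which is presumably where the speedup comes from.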

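There is still a 3-10x performance gap between WebGL and CUDA, which they intend to keep optimizing.

The gap exists because WebGL lacks GPGPU features such as work groups and shared memory

Hence WebGPU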

Implementation

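Unlike OpenCL and CUDA, WebGL is based on OpenGL ES, which does not explicitly support GPGPU

They developed GPGPUContext

It works around the complexity and constraints of WebGL

Uses fragment shaders

Originally meant for computing pixel colors

Executed independently and in parallel for each pixel

This is exploited to parallelize matrix operations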

draw pipeline

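Example: A + B = C

Parallelized per pixel. Only the R channel of RGBA is used.

Memory is allocated for G, B, and A as well, but in WebGL 2 gl.R32F allows allocating only R

In the future they want to use all channels and make better use of the GPU's sampler cache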
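
A CPU-side emulation of the A + B = C idea, just to make the per-pixel model concrete. The real thing is a GLSL fragment shader run by the GPU; Texture, sample, and draw below are hypothetical helpers written for these notes.

```typescript
// A "texture" here is a 2D grid of single-channel floats (think R only).
interface Texture {
  rows: number;
  cols: number;
  data: Float32Array;
}

// A "fragment shader" is modeled as a function from output coordinates to one value.
type Shader = (row: number, col: number) => number;

function makeTexture(rows: number, cols: number, values: number[]): Texture {
  return { rows, cols, data: new Float32Array(values) };
}

function sample(t: Texture, row: number, col: number): number {
  return t.data[row * t.cols + col];
}

// Emulates a draw call: the shader runs once per output pixel, and no
// invocation reads another invocation's output, so a GPU can run them in parallel.
function draw(rows: number, cols: number, shader: Shader): Texture {
  const out = new Float32Array(rows * cols);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out[r * cols + c] = shader(r, c);
    }
  }
  return { rows, cols, data: out };
}

const texA = makeTexture(2, 2, [1, 2, 3, 4]);
const texB = makeTexture(2, 2, [10, 20, 30, 40]);
// The add "shader": output pixel (r, c) = A(r, c) + B(r, c).
const texC = draw(2, 2, (r, c) => sample(texA, r, c) + sample(texB, r, c));
console.log(texC.data); // values: 11, 22, 33, 44
```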

shader compiler

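GLSL is hard to debug, so:

Shaders can be written against logical shapes instead of 2D textures and physical shapes

Behind the scenes, it abstracts over the differences between Safari and Chrome

A 1x3x1x2 tensor becomes a 3x2 (2D) texture in the backend

1.3x speedup

Chrome: 32-bit single-channel float texture

Safari: 16-bit single-channel float texture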
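
A small sketch of the logical-to-physical mapping in the 1x3x1x2 example. It assumes row-major order and that the squeezed shape is already 2D; squeezeShape and logicalToTexel are hypothetical names, not TF.js functions.

```typescript
// Drops size-1 dimensions: [1, 3, 1, 2] -> [3, 2].
function squeezeShape(shape: number[]): number[] {
  const squeezed = shape.filter((d) => d !== 1);
  return squeezed.length > 0 ? squeezed : [1];
}

// Maps logical N-D coordinates to (row, col) in a 2D texture,
// assuming row-major order in both the tensor and the texture.
function logicalToTexel(
  shape: number[],
  coords: number[],
  texShape: [number, number]
): [number, number] {
  let flat = 0;
  for (let i = 0; i < shape.length; i++) {
    flat = flat * shape[i] + coords[i];
  }
  const texCols = texShape[1];
  return [Math.floor(flat / texCols), flat % texCols];
}

const logicalShape = [1, 3, 1, 2];
const physicalShape = squeezeShape(logicalShape);
console.log(physicalShape); // [3, 2]

// Logical element [0, 2, 0, 1] lands at texture row 2, column 1.
const texel = logicalToTexel(logicalShape, [0, 2, 0, 1], [3, 2]);
console.log(texel); // [2, 1]
```

The shader author only thinks in the logical coordinates; a generated helper like logicalToTexel would do the lookup.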

Async in WebGL

While the GPU is working, the CPU is free

However, to retrieve the underlying data of a texture, the WebGL API only provides a blocking gl.readPixels() method. To get around this limitation, we approximate when the GPU is done executing the operations, postponing the call to gl.readPixels(), which releases the main thread in the meantime.

The result has to be fetched with gl.readPixels, which blocks, so they estimate when the GPU has finished the ops and postpone the actual gl.readPixels call until then

Intermediate results don't need readPixels, so presumably only the final result does.

Probably readPixels is first called when the promise is awaited?

Disposing and re-allocating WebGL textures is expensive, so memory is not released when a tensor is disposed

The texture can be reused by another tensor of the same shape

Same-shaped tensors are common in ML, so this helps a lot
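
A minimal sketch of shape-keyed texture recycling as I understand it from the notes above. FakeTexture and TextureRecycler are hypothetical stand-ins; a real backend would hold WebGLTexture handles and likely key on more than the shape.

```typescript
// Stand-in for a GPU texture handle.
interface FakeTexture {
  id: number;
  shape: number[];
}

class TextureRecycler {
  // Free textures grouped by shape, e.g. "3x2" -> [texture, ...].
  private free: Record<string, FakeTexture[]> = {};
  private nextId = 0;
  allocations = 0;

  // Returns a pooled texture of the same shape if one exists, else allocates.
  acquire(shape: number[]): FakeTexture {
    const pool = this.free[shape.join("x")];
    const pooled = pool !== undefined ? pool.pop() : undefined;
    if (pooled !== undefined) return pooled;
    this.allocations++;
    return { id: this.nextId++, shape: shape.slice() };
  }

  // Called when a tensor is disposed: the texture is kept for reuse, not freed.
  release(tex: FakeTexture): void {
    const key = tex.shape.join("x");
    const pool = this.free[key] !== undefined ? this.free[key] : [];
    pool.push(tex);
    this.free[key] = pool;
  }
}

const recycler = new TextureRecycler();
const first = recycler.acquire([3, 2]);
recycler.release(first);                 // tensor disposed, texture pooled
const second = recycler.acquire([3, 2]); // same shape: reuses the pooled texture
const third = recycler.acquire([4, 4]);  // new shape: fresh allocation
console.log(first.id === second.id, third.id, recycler.allocations); // true 1 2
```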

Specifically, we automatically page WebGL textures to the CPU when the total amount of GPU memory allocated exceeds a threshold which can be estimated from the screen size.

This seems expensive

No paging happens when tf.tidy or tf.dispose is used

https://webglreport.com/?v=1

Some old Android devices have no GPU

iOS supports only fp16, so it falls back even when float32 is requested

For reasons like these, WebGL is not a stable target

Node

Binds to the TensorFlow C API via N-API

AVX, CUDA

Even TPUs could be supported in the future

Future backend

Wasm

Faster than vanilla JS

SIMD support to come

WebGPU

Model Converter

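Strips training ops

Packs weights into 4MB files, optimized for browser auto-caching

Resources are easy to serve for JS, so the weights are hosted on GCS,

which makes it easy for beginners to integrate models into their apps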
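
A sketch of the sharding step only, assuming "4MB" means 4 * 1024 * 1024 bytes. shardWeights is a hypothetical helper; the real converter also writes a manifest describing the shards, which is omitted here.

```typescript
// Assumed shard size: 4MB, small enough for the browser to cache each file.
const SHARD_BYTES = 4 * 1024 * 1024;

// Splits a weight buffer into consecutive shards of at most shardBytes each.
// subarray creates views, so no weight data is copied.
function shardWeights(weights: Uint8Array, shardBytes: number = SHARD_BYTES): Uint8Array[] {
  const shards: Uint8Array[] = [];
  for (let offset = 0; offset < weights.length; offset += shardBytes) {
    shards.push(weights.subarray(offset, offset + shardBytes));
  }
  return shards;
}

// 10MB of zeros as a stand-in for real model weights.
const fakeWeights = new Uint8Array(10 * 1024 * 1024);
const weightShards = shardWeights(fakeWeights);
console.log(weightShards.map((s) => s.length)); // [4194304, 4194304, 2097152]
```

Each shard can then be served as a separate static file and fetched (and cached) independently.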

our core goal of enabling ML beginners

wrapper API

Hides tensors from the user: inputs are DOM elements or primitive arrays, and outputs are plain JS objects

Working with tensors directly is for expert users

Personalization via transfer learning

Example

easy to get started with

ML can be taught without any special installation

Teachable Machine

GAN Lab

ML5

friendly ML for the web

For artists and creative coders

Deep Learning Practicum MIT

Many people who had a passing interest in ML, but found it difficult to access, use TensorFlow.js as an opportunity to learn about the new technology with fewer barriers to entry.

I.e., as a chance to learn a new technology with a low barrier to entry

Gestural Interfaces

sign language to speech

pose detection

Research Dissemination

https://worldmodels.github.io/

Numeric Applications

Used as a GPU numerical computing tool

https://github.com/tensorflow/tfjs-tsne

Desktop and Production Applications

https://github.com/clinicjs/node-clinic

Clinic.js Doctor just got more advanced with TensorFlow.js - Blog - Clinic.js

Classifies things like whether a CPU spike is caused by user code or by Node's GC

https://mood.gg/ow